QSAR, homology modeling, and docking simulation on SARS-CoV-2 and pseudomonas aeruginosa inhibitors, ADMET, and molecular dynamic simulations to find a possible oral lead candidate

In seek of potent and non-toxic iminoguanidine derivatives formerly assessed as active Pseudomonas aeruginosa inhibitors, a combined mathematical approach of quantitative structure-activity relationship (QSAR), homology modeling, docking simulation, ADMET, and molecular dynamics simulations were executed on iminoguanidine derivatives. The QSAR method was employed to statistically analyze the structure-activity relationships (SAR) and had conceded good statistical significance for eminent predictive model; (GA-MLR: Q2LOO = 0.8027; R2 = 0.8735; R2ext = 0.7536). Thorough scrutiny of the predictive models disclosed that the Centered Broto-Moreau autocorrelation - lag 1/weighted by I-state and 3D topological distance-based autocorrelation—lag 9/weighted by I-state oversee the biological activity and rendered much useful information to realize the properties required to develop new potent Pseudomonas aeruginosa inhibitors. The next mathematical model work accomplished here emphasizes finding a potential drug that could aid in curing Pseudomonas aeruginosa and SARS-CoV-2 as the drug targets Pseudomonas aeruginosa. This involves homology modeling of RNA polymerase-binding transcription factor DksA and COVID-19 main protease receptors, docking simulations, and pharmacokinetic screening studies of hits compounds against the receptor to identify potential inhibitors that can serve to regulate the modeled enzymes. The modeled protein exhibits the most favorable regions more than 90% with a minimum disallowed region less than 5% and is simulated under a hydrophilic environment. The docking simulations of all the series to the binding pocket of the built protein model were done to demonstrate their binding style and to recognize critical interacting residues inside the binding site. Their binding constancy for the modeled receptors has been assessed through RMSD, RMSF, and SASA analysis from 1-ns molecular dynamics simulations (MDS) run. Our acknowledged drugs could be a proficient cure for SARS-CoV-2 and Pseudomonas aeruginosa drug discovery, having said that extra testing (in vitro and in vivo) is essential to explain their latent as novel drugs and manner of action.


Background
Coronaviruses are separated into four kinds: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus [1]. Many species, including humans, have been shown to suffer respiratory, intestinal, neurological disorders, and hepatic caused by these viruses, particularly Betacoronavirus [2]. The World Health Organization (WHO) named it 2019-novel coronavirus (2019-nCoV) after determining the involvement of coronavirus in COVID- 19 [3] (https:// www. who. int/ emerg encies/ disea ses/ novel-coron avirus-2019). Referable to world health emergencies, the International Committee of Coronavirus Study Group (ICCSG) proposed using the named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) for 2019-nCoV [4]. Because of the onset of pandemic crises around the world, SARS-CoV-2 has now developed a major community health anxiety [5]. The WHO has labeled COVID-19 a community health matter of global concern because of its speedy spreading and ever-increasing procreation/transmission number [6]. As of August 13, 2021, the number of confirmed cases is 205,338,159 and the number of confirmed deaths is 4,333,094 (https:// www. who. int/ emerg encie ss/ disea ses/ novel-coron avirus-2019). During infection with SARS-CoV-2, the amount of Pseudomonas aeruginosa increases, encouraging inflammation by accelerating the recruitment of inflammatory cells and increasing the level of angiopoietin II (https:// www. who. int/ emerg encie ss/ disea ses/ novel-coron avirus-2019). The protease is one of the numerous products of the SARS-CoV-2 binding target [7,8]. Drugs remain the only therapeutic option for Pseudomonas aeruginosa and SARS-CoV-2, despite efforts to create a vaccine [9]. Due to different medication resistance scenarios around the world, the number of people dying annually from Pseudomonas aeruginosa and SARS-CoV-2 is steadily rising [9,10]. Given the lack of viable medicines and the continual growth in transmission numbers and fatality cases. Computeraided drug discovery (CADD) [11] could be a good strategy to discover hit drugs for Pseudomonas aeruginosa and SARS-CoV-2 treatment. This computer-aided drug design and development technique will cut down on the cost and time it takes to find new therapeutic candidates [12]. Ahmad et al. have reported the docking, molecular dynamic simulation, and MM-PBSA studies of Nigella Sativa compounds to find likely normal antiviral drugs for SARS-CoV-2 treatment [13]. Amin and his coworkers have reported the use of Monte Carlo-based QSAR, virtual screening, and molecular docking study of some inhouse molecules as inhibitors of COVID-19 [14]. Several CADD methods have been used to study and design hit drugs such as anticancer [15,16], monoamine oxidase B inhibitors [17], antimicrobial [18], dengue virus [19], and antidiabetic [20] drugs, etc. To select a chemical compound as a viable treatment, the following in silico technique such as quantitative structure-activity relationship (QSAR), molecular docking simulation, absorption, metabolism, excretion, and distribution (ADME), and dynamics modeling of many drugs from known drugs library are used against the target receptors. In the present research, we executed QSAR studies on some chemical libraries using genetic function approximationmultiple linear regression (GFA-MLR). The best model out of the many generated model will be systematically analyzed. The results gained from these methods were equated for validation. Next, we perform the homology modeling of our query protein, then docking simulation to obtain information about the main interaction types from the built model receptor active pocket. Their drug-likeness parameters of the most beneficial docked compound were assessed via in silico approach. Finally, simulations were executed to assess the dynamic stableness of the docked receptors. The current modeling study would offer understanding into the structural demands of these COVID-19 and Pseudomonas aeruginosa inhibitors and may aid in planning novel drugs.

Methods
Density function theory (DFT/B3LYP) with the 6-31G+ (d, p) basis sets in Gaussian 09 were used to thoroughly optimize the geometries of the iminoguanidine derivatives (PubChem database accession number AID_131512). The PaDEL v2.20 program [21] was used to calculate the properties for QSAR analysis. The association between one dependent variable (pMIC 50 ) of 25 compounds and various independent variables was studied using GA-MLR statistical techniques. The genetic approximation (GA) technique which is included in QSARINS v2.2.4 [22] was used to perform multiple linear regression (MLR) analysis of the molecular descriptors. By dividing the database into two groups, a training set to construct the quantitative model and a test set to confirm the proficiency of the molded model. All the minimum inhibitory concentration (MIC) activity data in the experiments were first translated to the negative logarithm of MIC (pMIC 50 = −log10 (MIC)). shows the chemical structures of iminoguanidine compounds as well as their activity levels. To test the internal validity of the regression model, we employed the LOO (leave-one-out) approach [23,24]. This (Q2LOO) is the most frequent way of determining a model's inner prediction ability. We used randomized validation [25] (Q2rand, R2rand), root mean square error of the training set (RMSEc), and coefficient of determination to assess model robustness in addition to (Q2LOO). For external validation, we used Q2F1 [26], Q2F2 [27], and Q2F3 [28], as well as the concordance correlation coefficient (CCC) and root mean square error of prediction (RMSEp) as recommended by the Organization for Economic Cooperation and Development (OECD) [29]. QL2OO > 0.5, R2 > 0.6, 0.85 ≤ k ≤ 1.15 or 0.6, 0.85 ≤ k' ≤ 1.15 [30], Q2F1 > 0.5, Q2F2 > 0.5, Q2F3 > 0.5, and CCC > 0.80 are some of the evaluation criteria.

Homology modeling
To build the initial structure for the molecular docking and MD simulation studies, homology modeling of Pseudomonas aeruginosa and SARS-CoV-2 secondary structure was undertaken.  [31], the coordinates for the query structure were assigned from the template structure using pairwise sequence alignment. MODLOOP Server [32] was used to correct irregular secondary structures. The 3D protein structures were then built using MODELLER 10.1 [33]. As a result, the model with the lowest discrete optimized protein energy (DOPE) score was chosen, and the model was then energy minimized (add hydrogen and Gasteiger charge) using Chimera v1.10.2 software with the AMBER FF14SB force field. SAVES server was used to calculate stereochemical characteristics, the atomic model's (3D) compatibility with its amino acid residues, bond lengths, bond angles, and side-chain planarity were all utilized to verify the model's quality. PROCHECK [34] was used to calculate Ramachandran plots to verify the stereochemical quality of modeled protein structures. Verify3D [35] and ERRAT [36] were used to create an environment profile. WHATIF was used to investigate residue packing and atomic contact, whereas WHATCHECK was utilized to calculate the Ramachandran plot's Z Score [37]. Using PyMOL, the RMSD was calculated by superimposing the 3D modeled protein with the template.

Structure-based virtual screening and docking
To perform molecular docking simulations and virtual screening, we utilized Autodock Vina [38] (Table S1) were used as control drugs against SARS-CoV-2 virus main protease and Pseudomonas aeruginosa proteins, respectively.

Molecular dynamics simulations (MDS)
MDS is a thermodynamic-based procedure that aids in the investigation of dynamic changes encountered in protein-ligand complexes. To certify the integrity of the ligand-protein combination in our investigation, we used MDS to examine the best ligands screened in previous phases with their corresponding proteins. The molecular docking complexes were simulated using the NAMD 2.13 Win64-multicore version [40], which included the Chemistry at HARvard Macromolecular Mechanics (CHARMM 36) force field [41] and the TIP3P water model. Several co-time approaches were applied, with a 2fs integration time step. The CHARMM-GUI web service [42] was used to produce ligand topology and parameter files, produce psf files of protein-ligand complexes, water box, and neutralize the system with potassium (K + ) and chloride (Cl -) ions. The simulation/production (NPT) ran for 1 ns with 5000 steps of minimization (NVT). The temperature was kept constant at 303 K using a Langevin thermostat. The system's perimeter was surrounded by

Results
In the current study, about 1500 descriptors from PaDeL v2.20 using DFT (B3LYP/6-31G+(d,p)) were computed. Descriptors compete for space in the 25 compounds studied; on these descriptors, a genetic approximationmultiple linear regression (GA-MLR) was employed. As a result, all descriptors with a low correlation coefficient value concerning the dependent variable were first discarded. Also, descriptors with a correlation coefficient larger than 0.95 are eliminated from our data matrix to reduce ambiguity. The GA analysis selects the remaining descriptors, which are then employed in the creation of MLR models. QSARINS software v2.2.4 [44,45] was used to divide the entire dataset into training and test sets at random. From the training set, the GA-MLR model with the highest coefficients of determination and explained variance in "leave one out" cross-validation prediction, and reasonable ability to predict MIC 50 values of test set chemicals was chosen. The extended QSAR model is given in the equation below: The more important the regression model, the lower the p-value (Table 1), and all of the descriptors' p-values were less than 0.05, indicating that they were statistically significant at the 95% level. Edache et al. [46] stipulated that the descriptors developed in a QSAR model should not be inter-correlated with one another. If descriptors are heavily connected among themselves, the model will be highly unstable. As a result, the developed model is statistically insignificant if the VIF is developed to evaluate descriptor inter-correlation. The VIF values of both descriptors in this model are 1.23 which are less than the threshold value of 10 [47]. Table 1 shows the parameters utilized in the final model have relatively low inter-correlation based on VIF analysis. The mean effect (MF) value was calculated for each descriptor to determine its relative importance and contribution to the model. ATSC1c is a molecular descriptor based on Centred Broto-Moreau autocorrelation with lag 1/I-state weighting. The descriptor is related to pMIC50 in a good way. It is assumed that increasing the ATSC1c descriptor by 76% boosts the bioactivity of drugs or anti-Pseudomonas aeruginosa activity. The final descriptor is TDB9s, which stands for 3D topological distance-based autocorrelation -lag 9/weighted by I-state. A 24% rise in the value of this descriptor increases the inhibitory activity of a compound.
Internal and external cross-validation was used to assess the model's predictive potential. The model's results, as well as their regression statistics, are presented in Table S2 and S3. Fig. S1 and S2 present the plots of experimental activity versus predicted activity for the training set and the test set compounds, calculated using model 1. Fitting's criteria, internal validation criteria, and external validation criteria values for the model were judged according to the acceptable threshold [48][49][50]. Furthermore, the residual for the predicted pMIC50 values for both the training and test sets are plotted against the experimental pMIC50 values in Fig. S3 and S4. The model did not show any proportional or systematic inaccuracy since the propagation of residuals on both sides of zero is random (Fig. S3). The residuals calculated using prediction by leave-one-out (LOO) (Fig. S4) confirm the claim [51]. Each component's leverage results can be computed and plotted against standardized residuals, allowing for graphical spotting of outliers and influential compounds in a model. The hat matrix (H's) diagonal elements indicate the molecules' leverages, which may be computed using the formula below: where X is the training set matrix and X T denotes the transpose of X. Fig. S5 and S6 show the applicability zone as a squared region defined by a 2.5 bound for residuals and where p signifies the number of model parameters and n constitutes the number of compounds [29]. Fig. S5 shows that the test set's compound 15, a response outlier and compound 16, a structurally influential outlier is outside of this square area. While in Fig. S6 using prediction by leave-one-out (LOO), compounds 15 and 20 of the training and test set with standardized residuals exceeding 2.5 standard deviation units are response outliers. A structurally influential outlier is compound 16 from the test set, which is not within the cut-off value of h* = 0.5.
(2) h * = 3(P + 1)/n  Surprisingly, one of the training sets compounds and two of the validation compounds both had leveraged greater than the threshold value and low residuals. As previously established by Jaworska and coworker [52] present, compounds with hat matrix (H's) greater than h* alleviate the model and make it predictive for new compounds that differ structurally from the training set [53]. This is only true when the training compound residuals are low.
To ensure that all molecules from the estimate set were within the model domain, we used the Insubria graph [54]. The leverages for prediction set vs predicted values are plotted in the graph (Fig. S7). Based on molecular similarity to the training set compounds (leverage value) and the predicted value of pMIC50, we identified the model's reliable prediction zone with this figure. We discovered that 50% of the molecules in the test set fit into the model's applicability zone. Compounds 12, 16, and 18 were discovered to be beyond the zone. To ensure model quality, the Y-scrambling process was used to confirm the absence of chance correlations in the initial GFA-MLR model. As projected, Fig. S8-S10 shows a satisfactory model was obtained.

Homology modeling
Homology modeling is typically used to create protein models and follows a set of well-defined and widely acknowledged procedures [55]. During the homology modeling phase, we aim for an experimentally determined structure with the COVID-19 virus main protease and RNA polymerase-binding transcription factor DksA (plasmid) that has a high "sequence identity. " Chain A, 3C-like proteinase (severe acute respiratory syndrome coronavirus 2) target and template PDB I.D: 5R7Y protein sequences were aligned as indicated in Fig. 1A. The homology model of COVID-19 primary protease in association with carmofur was built using crystal structures of chain A, 3C-like proteinase (PDB: 5R7Y) as a template, and then modified by loop modeling. Figure 1B shows an overview of the aligned template and target sequence's projected 3D structure with the alignment calculated using PyMOL molecular viewer yielded an RMSD value of 0.169. In this investigation, the Discrete Optimized Protein Energy (DOPE) score [56], which is included in the MODELLER package and is extensively used to assess the quality of 3D models. The DOPE score values for the SARS-CoV-2 models are presented in Table 2. Models with a lower DOPE score and high molpdf values were regarded as structurally sound and reliable in terms of energy values. The model with a DOPE score of −36285.0 and a molpdf value of 1550.75635 (model 1) was chosen in the case of the COVID-19 virus. The model and templates were superimposed according to the DOPE score profiles as presented in Fig. 2. The long active site loop between residues 10-50, 100-120, and 280-310, as well as the long helices at the C-terminal and N-terminal ends of the target sequence, has relatively high energy, according to the plotted DOPE score profile. This lengthy loop interaction with the region makes up the active sites.
Different techniques, such as PROCHECK (Ramachandran plot), PROVE, ERRAT2, and VERIFY 3D, were used to assess the 3D model's structural integrity. The modeled protein's Ramachandran plot (Fig. 3A, B) shows that 93.3% (250 aa) of the total residues are in the most  (Fig. 3C) was obtained, and it showed PASS. The ERRAT2 overall quality factor for the COVID-19 model is around 88.26% (Fig. S11A). The overlapping of the structure of transcription factor DksA2 from Pseudomonas aeruginosa and RNA polymerase-binding transcription factor DksA models shows great similarity, possibly due to the homology modeling procedure (Fig. 4A). Ten (10) PDB structures were generated, using MODELLER 10.1, and the best receptor model was chosen based on the DOPE assessment method as presented in Table 3. Figure 4 shows an overview of the aligned template and target sequence's projected 3D structure with the alignment calculated using PyMOL yielded an RMS value of 0.288. The model and templates were superimposed according to the DOPE score profiles as shown in Fig. 5. To evaluate the reliability of RNA polymerase-binding transcription factor DksA models built for docking purposes, we used a Ramachandran plot. These methods identify the Psi/Phi angle distribution in the 3D model within the allowed or disallowed regions. Ramachandran plot (Fig. 6) of the modeled protein represents 94.6% (122 aa) of the total residues in the most favored regions, 3.1% (4 aa) in additionally allowed regions, residues in generously allowed regions is 1.6% (2 aa), and 0.8% (1 aa) residues in disallowed regions, indicating a good quality model. The modeled protein's Verify 3D plot (Fig. 6C) was obtained, and it showed PASS. The ERRAT2 overall quality factor for the RNA polymerase-binding transcription factor DksA model is around 91.667% (Fig. S11B).

Molecular docking simulations
The selected configurations from the docking result are required in molecular docking simulation to determine the theoretical correctness of the produced complex structure between ligand and receptor. The active site of the modeled SARS-CoV-2 proteinase and modeled RNA polymerase-binding transcription factor DksA was docked by all 25 studied compounds and 8 controls or tested drugs. Within the defined active site, the docking program generates several poses with varied placements. The binding affinity score was used to determine the final ranking of the ligand docking postures. The binding affinity score of all the studied compounds and the control  drugs are presented in Table S4. The binding poses of the best ligand and standards with the lowest binding affinity are depicted in 3D and 2D diagrams in Fig. 7. The ligand number 18 has the highest binding affinity against SARS-CoV-2 virus main protease, at −8.7 kcal/mol, followed by the control (Ritonavir) at −8.4 kcal/mol. As illustrated in  Fig. 7B, a pi-donor hydrogen bond interaction with the terminal benzene ring was also created. Against modeled RNA polymerase-binding transcription factor DksA model protein, Doxycycline showed better binding affinity than ligand numbers 7, 12, and 15 (Table S4). Doxycycline has the maximum negative binding affinity of −7.2 kcal/mol, followed by Ritonavir with −6.7 kcal/mol. Compounds 7, 12, and 15 have a better binding affinity (−6.5 kcal/mol) than the rest of the studied compounds. From (Fig. 7C-E), compound 7 forms two conventional hydrogen bond interactions with the active site residues Pro109 (4.24 Å) and (5.58 Å), it also forms one unfavorable donor-donor interaction with Asp126 (Fig. 7C). Compound 12 forms five conventional hydrogen bonds and two hydrophobic interactions as presented in Fig. 7D. While compound 15 (Fig. 7E) also have 5 conventional hydrogen bonds with Ser21 (2.67 Å), Asp18 (4.23 Å), Tyr19 (5.06 Å), Ser17 (5.27 Å), and Tyr19 (5.44 Å). A carbon-hydrogen bond with Asp18 (4.32 Å) and two hydrophobic interactions with Pro109 (5.37 Å) and Tyr19 (4.8 Å), respectively. Lastly, the control drugs (Doxycycline) have two conventional hydrogen bonds with Ile125 (4.11 Å) and Gly111 (4.17 Å) and two unfavorable donor-donor interactions with Asp126 and Lys113. The unfavorable interactions found in compound 7 and Doxycycline disqualified them for further analysis. Compound 15 (Fig. 7E) has more hydrogen bonds than compound 12; hence, compound 15 was used for molecular dynamics simulations. SwissADME (http:// www. swiss adme. ch/) was employed to estimate the drug-likeness of our inhibitors, including their ADME inside the body [57]. The SwissADME program's Egan BOILED-Egg method was utilized to determine the inhibitors' absorption in the intestinal system and the brain. The BOILED-Egg (Brain Or IntestinaL EstimateD permeation predictive model), also known as the Egan egg, provides a threshold (WLOGP ≤ 5.88 and TPSA ≤ 131.6) as well as a well-defined graphic illustration of how far a chemical structure deviates from the ideal for optimal absorption [58]. In Fig. 8, the molecules in the white part of this 2D graphical representation are predicted to be quietly absorbed by the gastrointestinal (GI) tract, whereas the yolk area represents chemicals that can passively cross the blood-brain barrier (BBB). None of the chemicals are absorbed by the brain, as seen in the graph. The gastrointestinal absorption of all inhibitors was within tolerable limits (WLOGP ≤ 5.88 and TPSA ≤ 131.6) (Fig. 8). The blue dots (compound 5) indicate molecules that P-glycoprotein is predicted to effluate from the central nervous system (CNS), whereas the remaining compounds (red dots) indicate compounds that P-glycoprotein is predicted not to effluate from the CNS. Figure 9 depicts the bioavailability radar of the compounds for six physicochemical characteristics. The bioavailability radars of compounds 15 (Fig. 9A) and 18 (Fig. 9B) demonstrated a quick assessment of druglikeness. The bioavailability radar takes into account the following six physicochemical characteristics: (1) lipophilicity (XLOGP3 between 0.7 and +5.0), (2) size (molecular weight between 150 and 500 g/mol), (3) polarity (total polar surface area between 20 and 1302), (4) solubility (log S less than 6), (5) saturation (fraction Csp3 less than 0.25), and (6) flexibility (the number of rotatable bonds not more than 9). The pink area reflects the optimal range of these traits [59], while the red line shows each compound's properties. In Fig. 9, the in-saturation of both compounds is visible, whereas the other characteristics are inside the pink area. As a result, we can conclude that these chemicals are expected to be bioavailable when taken orally.

The MD simulations of the docked complexes
The MDS was executed to assess the constancy of the docked complexes. The complex stability was investigated by calculating the backbone using rootmean-square deviation (RMSD), root means square fluctuation (RMSF), and solvent accessible surface area (SASA). The RMSD of the Cα atoms in the docked complexes was assessed to see the structural deviations all over the simulation trajectory. The complexes reach their stable state after 1-ns which showed structural stability. The RMSD value of the SARS-CoV-2 protein complex is 2.76 Å and that of the Pseudomonas aeruginosa protein complex is 3.47 Å. As shown in Fig. 10A, the fluctuation of the SARS-CoV-2 protein complex was within acceptable range with RMSD less than 3 Å indicating the stability of the protein complex conformation. The fluctuation of the Pseudomonas aeruginosa protein complex (Fig. 11A) exhibited an increasingly RMSD value toward the end of the simulation. To examine the local differences of protein flexibility, the RMSF results were calculated by taking the average of all backbone residues of atoms ( Figs. 10 and 11B). The changes shown below play a significant role in protein complex flexibility, influencing protein-ligand activity and stability. The high RMSF value demonstrates more flexibility, with a maximum level of fluctuation in the residue positions of 400 ps at 1  (Fig. 11B) (Table 4). MD simulation was applied to confirm the reliability of each ligand into the active site of the enzymes. The fresh identified hit compounds formed stable hydrogen bond interactions with the modeled active residues, e.g., Glu299 and Met6 for SARS-CoV-2 main protease (Fig. 10D) and Tyr19 for RNA polymerase-binding transcription factor DksA (Fig. 11D). The MD simulation also supported that each hit compound formed hydrophobic interactions with residues occupying the active site of SARS-CoV-2 main protease and RNA polymerase-binding transcription factor. Eventually, we proposed two-hit compounds as key practical weapons for the COVID-19 main protease and RNA polymerase therapeutics against SARS-CoV-2 and Pseudomonas aeruginosa inhibition, respectively.

Conclusion
The created 2D-QSAR models' regression statistics demonstrated that they were statistically significant. Furthermore, during fitting's criteria, internal, and external cross-validation trials, relatively low residuals were acquired, showing that the constructed models were predictive. Their satisfactory QL2OO, R2, Q2F1, Q2F2, Q2F3, and CCC values backed up this claim. In docking simulation, compounds 15 and 18 were predicted as the best RNA polymerase-binding transcription factor and SARS-CoV-2 virus main protease inhibitor, respectively (with maximum binding affinity) to be employed as a possible cure orally active drug (based on BOILED-egg and bioavailability radar approach). Molecular dynamic simulations analyze admitting RMSD, RMSF, and SASA analysis affirmed their binding constancy with respective modeled proteins throughout the simulation chronology. Our present exploit can be generative in determining new remedies against SARS-CoV-2 virus main protease and Pseudomonas aeruginosa, having said that general test (in vitro and in vivo) studies are required to test our theoretical analysis.